Randomization
Assumption: Under the null hypothesis, observations are random draws from a common population
d_EK <- JL |> filter(Treatment == "eyes") |> summarize(m = mean(Shift)) |> pull(m) -
JL |> filter(Treatment == "knee") |> summarize(m = mean(Shift)) |> pull(m)
d_EC <- JL |> filter(Treatment == "eyes") |> summarize(m = mean(Shift)) |> pull(m) -
JL |> filter(Treatment == "control") |> summarize(m = mean(Shift)) |> pull(m)
d_CK <- JL |> filter(Treatment == "control") |> summarize(m = mean(Shift)) |> pull(m) -
JL |> filter(Treatment == "knee") |> summarize(m = mean(Shift)) |> pull(m)
fm_lm <- anova(lm(Shift ~ Treatment, data = JL))
obsF <- fm_lm$`F value`[1]
obs <- tibble(Fstat = obsF, d_EK = d_EK,
d_EC = d_EC, d_CK = d_CK)
obs
# A tibble: 1 × 4
  Fstat  d_EK  d_EC   d_CK
  <dbl> <dbl> <dbl>  <dbl>
1  7.29 -1.22 -1.24 0.0270
Rows: 22
Columns: 2
$ Treatment <fct> control, control, control, control, control, control, contro…
$ Shift <dbl> 0.53, 0.36, 0.20, -0.37, -0.60, -0.64, -0.68, -1.27, 0.73, 0…
Shuffle `Shift` to randomly associate each value with one of our three treatment categories:
set.seed(6445287)
niter <- 1000
rand.out <- tibble(Fstat = rep(NA,niter), d_EK = rep(NA,niter),
d_EC = rep(NA,niter), d_CK = rep(NA,niter))
rand.out[1,] <- obs
for(ii in 2:niter) {
JL.s <- JL
JL.s$Shift <- sample(JL$Shift)
d_EK <- JL.s |> filter(Treatment == "eyes") |> summarize(m = mean(Shift)) |> pull(m) -
JL.s |> filter(Treatment == "knee") |> summarize(m = mean(Shift)) |> pull(m)
d_EC <- JL.s |> filter(Treatment == "eyes") |> summarize(m = mean(Shift)) |> pull(m) -
JL.s |> filter(Treatment == "control") |> summarize(m = mean(Shift)) |> pull(m)
d_CK <- JL.s |> filter(Treatment == "control") |> summarize(m = mean(Shift)) |> pull(m) -
JL.s |> filter(Treatment == "knee") |> summarize(m = mean(Shift)) |> pull(m)
fm_lm <- anova(lm(Shift ~ Treatment, data = JL.s))
obsF <- fm_lm$`F value`[1]
rand.out[ii,] <- tibble(Fstat = obsF, d_EK = d_EK,
d_EC = d_EC, d_CK = d_CK)
}
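The loop above can also be packaged as a function and run with `replicate()`. A base-R sketch, using simulated stand-in data in place of the real `JL` values (the 8/7/7 group sizes are an assumption for illustration):

```r
set.seed(6445287)
# Simulated stand-in for the JL data (real values not reproduced here)
JL <- data.frame(
  Treatment = factor(rep(c("control", "knee", "eyes"), times = c(8, 7, 7))),
  Shift     = rnorm(22)
)

# One randomization: shuffle Shift, recompute the ANOVA F statistic
rand_F <- function(dat) {
  dat$Shift <- sample(dat$Shift)
  anova(lm(Shift ~ Treatment, data = dat))$`F value`[1]
}

obsF  <- anova(lm(Shift ~ Treatment, data = JL))$`F value`[1]
Fstat <- c(obsF, replicate(999, rand_F(JL)))  # observed statistic kept in slot 1
sum(Fstat >= Fstat[1]) / length(Fstat)        # randomization P for the F test
```

Keeping the observed statistic in slot 1, as in the loop version, guarantees the proportion can never be exactly zero.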
\[P_e = \frac{r + 1}{n + 1}\]
where \(r\) is the number of estimates equal to or more extreme than the observed and \(n\) is the number of randomizations.
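In code, with `obs_stat` the observed statistic and `rand_stats` a vector of \(n\) statistics from re-randomized data (placeholder names, with simulated values standing in for real randomization output):

```r
set.seed(1)
obs_stat   <- 7.29                        # observed F statistic
rand_stats <- rf(999, df1 = 2, df2 = 19)  # stand-in for 999 randomization draws

r   <- sum(rand_stats >= obs_stat)  # estimates as or more extreme than observed
n   <- length(rand_stats)           # number of randomizations
P_e <- (r + 1) / (n + 1)
P_e
```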
sum(rand.out$Fstat >= rand.out$Fstat[1])/nrow(rand.out)
[1] 0.004
apply(rand.out, 2, function(x) sum(abs(x) >= abs(x[1]))/length(x))
Fstat  d_EK  d_EC  d_CK
0.004 0.005 0.006 0.949
Because the observed statistics sit in row 1 of `rand.out`, these simple proportions implement \((r + 1)/(n + 1)\): the observed value counts once in both the numerator and the denominator.
A second example: heart rate (`hr`) as a function of log movement rate (`log.move`) in the `bearmove` data:
Rows: 2,768
Columns: 2
$ log.move <dbl> 2.9391619, 0.5247285, -0.5960205, -0.3754210, -0.6386590, 0.5…
$ hr <int> 60, 54, 34, 37, 37, 41, 44, 55, 58, 51, 47, 42, 43, 51, 48, 5…
Call:
lm(formula = hr ~ log.move, data = bearmove)
Residuals:
Min 1Q Median 3Q Max
-52.127 -11.182 0.196 10.808 61.588
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 58.5514 0.6284 93.17 <2e-16 ***
log.move 2.7134 0.1463 18.55 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 15.68 on 2766 degrees of freedom
Multiple R-squared: 0.1106, Adjusted R-squared: 0.1103
F-statistic: 344.1 on 1 and 2766 DF, p-value: < 2.2e-16
The observed t statistic for the slope:
[1] 18.54867
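The value above matches the slope's t statistic in the summary table; it can be pulled from the fitted model directly. A sketch with simulated stand-in data (the real `bearmove` values are not reproduced here):

```r
set.seed(42)
# Simulated stand-in for the bearmove data
bearmove <- data.frame(log.move = rnorm(200))
bearmove$hr <- 58.6 + 2.7 * bearmove$log.move + rnorm(200, sd = 15.7)

fm    <- lm(hr ~ log.move, data = bearmove)
t_obs <- summary(fm)$coefficients["log.move", "t value"]
t_obs
```

A randomization test of the slope would then shuffle `log.move` (or `hr`), refit, and compare the resulting t statistics to `t_obs`, exactly as in the ANOVA example.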
A paired example: 40 paired observations of `TypeA` and `TypeB`:
Rows: 40
Columns: 2
$ TypeA <dbl> 196.7824, 191.7938, 205.7433, 201.5731, 210.2526, 175.1925, 186.…
$ TypeB <dbl> 207.3581, 204.8712, 193.9763, 200.8164, 212.5679, 184.3298, 209.…
Paired t-test
data: dd$TypeA and dd$TypeB
t = -0.62243, df = 39, p-value = 0.5373
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
-5.411935 2.864956
sample estimates:
mean difference
-1.27349
[1] -1.27349
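For paired data, the randomization analogue flips the sign of each within-pair difference at random: under the null hypothesis, either ordering within a pair is equally likely. A sketch with simulated stand-in differences; `d` would be `dd$TypeA - dd$TypeB` in the example above:

```r
set.seed(99)
d <- rnorm(40, mean = -1.3, sd = 13)  # stand-in for TypeA - TypeB differences

obs_mean <- mean(d)
niter    <- 1000
# Each iteration randomly flips the sign of every pair's difference
rand_means <- replicate(niter - 1,
                        mean(d * sample(c(-1, 1), length(d), replace = TRUE)))
rand_means <- c(obs_mean, rand_means)          # observed value kept in slot 1
sum(abs(rand_means) >= abs(obs_mean)) / niter  # two-sided randomization P
```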
What is the minimal detectable P for n iterations?
| nreps | n_randomizations | min_P |
|---|---|---|
| 2 | 18 | 0.1111111 |
| 4 | 63 | 0.0317460 |
| 6 | 219 | 0.0091324 |
| 8 | 756 | 0.0026455 |
| 10 | 2604 | 0.0007680 |
| 12 | 8966 | 0.0002231 |
| 14 | 30865 | 0.0000648 |
| 16 | 106246 | 0.0000188 |
| 18 | 365727 | 0.0000055 |
| 20 | 1258925 | 0.0000016 |
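For the Monte Carlo version used earlier, the floor follows from the estimator itself: with the observed statistic kept among the \(n + 1\) values, the smallest attainable one-sided P is \(1/(n + 1)\), i.e. \(r = 0\) in the formula above. A sketch:

```r
min_P <- function(n) 1 / (n + 1)  # r = 0: no randomization beats the observed value
min_P(999)                        # 999 randomizations + observed
sapply(c(99, 999, 9999), min_P)
```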
Non-parametric tests are often used when data do not meet the assumptions of a traditional (parametric) test:
Small sample size, non-normality, unequal variances
Dramatically lower power compared to a parametric test
For all practical cases, randomization is a better alternative
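As a quick side-by-side on simulated two-sample data (illustration only; the sample sizes and effect size are made up):

```r
set.seed(7)
x <- rnorm(10, mean = 0)
y <- rnorm(10, mean = 1)

t.test(x, y)$p.value       # parametric
wilcox.test(x, y)$p.value  # non-parametric (rank-based)

# Randomization test on the difference in means
d_obs <- mean(x) - mean(y)
pool  <- c(x, y)
d_rnd <- replicate(999, { s <- sample(pool); mean(s[1:10]) - mean(s[11:20]) })
sum(abs(c(d_obs, d_rnd)) >= abs(d_obs)) / 1000
```

The randomization test uses the same statistic as the t-test (a difference in means) but gets its null distribution from reshuffling the pooled data rather than from a theoretical distribution.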